BioAR: Anaphora Resolution For Relating Protein Names To Proteome Database Entries
نویسندگان
چکیده
The need for associating, or grounding, protein names in the literature with the entries of proteome databases such as Swiss-Prot is well-recognized. The protein names in the biomedical literature show a high degree of morphological and syntactic variations, and various anaphoric expressions including null anaphors. We present a biomedical anaphora resolution system, BioAR, in order to address the variations of protein names and to further associate them with Swiss-Prot entries as the actual entities in the world. The system shows the performance of 59.5% 75.0% precision and 40.7% 56.3% recall, depending on the specific types of anaphoric expressions. We apply BioAR to the protein names in the biological interactions as extracted by our biomedical information extraction system, or BioIE, in order to construct protein pathways automatically.
منابع مشابه
BioThesaurus: a web-based thesaurus of protein and gene names
UNLABELLED BioThesaurus is a web-based system designed to map a comprehensive collection of protein and gene names to protein entries in the UniProt Knowledgebase. Currently covering more than two million proteins, BioThesaurus consists of over 2.8 million names extracted from multiple molecular biological databases according to the database cross-references in iProClass. The BioThesaurus web s...
متن کاملThe Effect of Anaphor and Ellipsis Resolution on Proximity Searching in a Text Database
So far, methods for ellipsis and anaphor resolution have been developed and the effects of anaphor resolution have been analyzed in the context of statistical information retrieval (IR) of scientific abstracts. No significant improvement has been observed. In this study, the effects of ellipsis and anaphor resolution on proximity searching in a full text database are analyzed. Anaphora and elli...
متن کاملI-3: Human Y Chromosome Proteome Project 2012 Update
The Human Genome Project has generated a blueprint for the approximately 20,300 gene-encoded proteins potentially active in any of 230 cell types that make up the human body (human proteome). However, based on the UniProtKB/Swiss-Prot database content, about 6000 of at the protein level; for many others, there is very little information related to protein function, abundance, subcellular locali...
متن کاملQuantitative Assessment of Dictionary-based Protein Named Entity Tagging
Objective: Natural language processing (NLP) approaches have been explored to manage and mine information recorded in biological literature. A critical step for biological literature mining is biological named entity tagging (BNET) that identifies names mentioned in text and normalizes them with entries in biological databases. The aim of this study was to provide quantitative assessment of the...
متن کاملResearch Paper: Quantitative Assessment of Dictionary-based Protein Named Entity Tagging
OBJECTIVE Natural language processing (NLP) approaches have been explored to manage and mine information recorded in biological literature. A critical step for biological literature mining is biological named entity tagging (BNET) that identifies names mentioned in text and normalizes them with entries in biological databases. The aim of this study was to provide quantitative assessment of the ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2004